The Allotrope Data Format (ADF) [[!ADF]] consists of several APIs and taxonomies. This document provides a Developer's Guide to the Allotrope Audit Trail API (ADF-AUDIT) [[!ADF-AUDIT]] for tracking changes made to ADF files. It introduces the ADF-AUDIT API and illustrates it with code examples. The API uses classes and properties that are based on their definitions in the ADF Audit Trail Ontology [[!ADF-AUDIT-Ontology]] and in other ontologies such as the W3C Provenance Ontology [[!PROV-O]] and the Provenance, Authoring and Versioning ontology [[!PAV]].
THESE MATERIALS ARE PROVIDED "AS IS" AND ALLOTROPE EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE WARRANTIES OF NON-INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This document is part of a set of specifications on the Allotrope Data Format [[!ADF]].
The Allotrope Data Format (ADF) defines an interface for storing scientific observations from analytical chemistry. It is intended for long-term stability of archived analytical data and fast real-time access to it. The ADF Data Cube API (ADF-DC) [[!ADF-DC]] defines an interface for storing raw analytical data. The ADF-AUDIT API uses classes derived from the ADF Audit Trail Ontology [[!ADF-AUDIT-Ontology]] as well as from [[!PROV-O]] and [[!PAV]].
The document is structured as follows: First, it shows how to activate the audit trail on an ADF file; the file is then under audit trail and changes to it are tracked. Second, it shows how these changes can be read, together with the information in the audit record of who made the change, when it was made, and why. Finally, it shows a way to provide an audit trail from an external source.
Within this document, the following namespace prefix bindings are used:
Prefix | Namespace |
---|---|
owl: | http://www.w3.org/2002/07/owl# |
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs: | http://www.w3.org/2000/01/rdf-schema# |
xsd: | http://www.w3.org/2001/XMLSchema# |
skos: | http://www.w3.org/2004/02/skos/core# |
dct: | http://purl.org/dc/terms/ |
adf-audit: | http://purl.allotrope.org/ontologies/audit# |
adf-dc: | http://purl.allotrope.org/ontologies/datacube# |
adf-dp: | http://purl.allotrope.org/ontologies/datapackage# |
foaf: | http://xmlns.com/foaf/0.1/ |
org: | http://www.w3.org/ns/org# |
prov: | http://www.w3.org/ns/prov# |
pav: | http://purl.org/pav/ |
ex: | http://example.com/ns# |
Within this document, decimal numbers will use a dot "." as the decimal mark.
This section introduces the core operations of the ADF-AUDIT API and illustrates them with examples. The core operations are activating the audit trail, starting a new audit record, and committing the changes, as well as later reading the audit trail for its metadata and the actual changes made.
The main entry point to the Audit Trail API is the interface AdfAuditTrailService. Given an ADF file adfFile of type AdfFile, an instance of this service may be retrieved as follows:
Java and C#:
AdfAuditTrailService auditTrailService = adfFile.getAuditTrailService();
The audit trail feature is activated on an ADF file by calling
Java and C#:
adfFile.activateAuditTrail();
After activating the audit trail, any changes made to the ADF file will be tracked in an active audit record that is appended to the audit trail of the file. Any change made to the ADF file without an open audit record will throw an exception. An audit record can be opened by calling one of the following methods on the AdfAuditTrailService, depending on the kind of change activity on the ADF file:
- auditTrailService.startRevision() - starts a generic revision of some ADF file content; the other methods describe more specific activities
- auditTrailService.startOperation() - starts an operation that typically adds new data content to the ADF file
- auditTrailService.startApproval() - starts an approval of some ADF file content that typically adds an approval result to the data description
- auditTrailService.startReview() - starts a review of some ADF file content, adding metadata and possibly making minor fixes
- auditTrailService.startCuration() - indicates some curation activity on the file content

All these methods expect an agent, a text description of the motivation, and a reference to the software with which the change is applied. The choice of the method sets the role the agent plays in the revision of the ADF file.
All these methods return a ChangeCapture object that acts as a handle to the newly created audit record in the audit trail of the ADF file. Changes to the ADF file are recorded until commit() is called on the change capture object. This closes and returns the audit record, and the ADF file becomes read-only again until a new audit record is created.
Java and C#:
Agent operator = ...;
String motivation = "need to change something";
Entity software = ...;
ChangeCapture changeCapture = auditTrailService.startOperation(operator, motivation, software);
// change something in the ADF file
AuditRecord lastRecord = changeCapture.commit();
AuditRecord, Agent, and Entity are shape classes: Java/C# representations of the OWL classes defined in the ADF-AUDIT and PROV ontologies. The audit record is an RDF graph in a separate dataset of the internal ADF quad store. Using these classes simplifies writing and reading the audit (meta)data from the RDF triples in the record. How to create shape classes and how to read and write them to RDF graphs is explained in a later section.
Note that the operation of closing the audit record is called commit(). This does not imply that the tracking of changes is in any way transactional: any change is actually committed immediately, and there is no way to roll back a change with the audit trail API. If the audit record is not committed via the ChangeCapture object before the ADF file is closed - which might be intended for a long-running change - the record remains open. If an application then wants to make a different change that should be tracked in a new record, the API allows closing the active audit record without the ChangeCapture object by using the method forceCommit() of the AdfAuditTrailService. The application must state the reason for doing so and who performed the forced commit.
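For illustration, a forced commit might look like the following sketch, in the same fragment style as the other examples. Note that the exact signature and parameter order of forceCommit() are assumptions here; only the agent and the reason are required per the description above.

Java and C#:

```java
// Sketch only: close a still-open audit record without the original
// ChangeCapture handle. The parameter order of forceCommit() is assumed.
Agent admin = ...;
AuditRecord forced = auditTrailService.forceCommit(admin,
        "closing a record left open by an interrupted client session");
```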
Now that the audit record has been created and closed, we want to read the metadata on why, when, who, and what has been changed and retrieve the original data in the ADF file that has been replaced by new data. For the data description, this can mean that statements (triples) have been added or deleted, or named graphs have been added or deleted. For the data cube, this means that data has been appended to the data cube, data has been updated in the data cube, or whole new data cubes have been added or deleted. For the data package, this means that files or folders have been created or deleted, or data has been appended to a file. Note that overwriting existing data in the file in a random-access way is not supported by the data package API.
An audit record can be retrieved by iterating over the audit records that are attached to the audit trail.
Java:
AuditTrail auditTrail = auditTrailService.getAuditTrail();
for (AuditRecord auditRecord : auditTrail.auditRecords()) {
System.out.println("IRI of audit record: " + auditRecord.id());
System.out.println("date of audit record: " + auditRecord.created());
}
C#:
AuditTrail auditTrail = auditTrailService.getAuditTrail();
foreach (AuditRecord auditRecord in auditTrail.auditRecords()) {
System.Console.WriteLine("IRI of audit record: " + auditRecord.id());
System.Console.WriteLine("date of audit record: " + auditRecord.created());
}
The audit records returned for the audit trail in this way do not contain the content of the audit record in the audit record dataset. The content must be retrieved separately as described later, or the RDF dataset must be exported using auditTrailService.exportAuditRecordDataset() for external analysis with SPARQL. The last audit record can be read directly using auditTrailService.lastAuditRecord(); the active one can be obtained from the ChangeCapture handle.
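The two retrieval paths can be sketched together as follows. This is a sketch: the parameters of exportAuditRecordDataset() are not shown in this document and are left as a placeholder.

Java and C#:

```java
// Sketch: inspect the most recent audit record, then export the full
// audit record dataset for external SPARQL analysis.
AuditRecord last = auditTrailService.lastAuditRecord();
System.out.println("last audit record: " + last.id());
// Export for external tooling; the exact parameters are not specified here.
auditTrailService.exportAuditRecordDataset(...);
```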
The metadata of an audit record contains three main parts coming from the [[!PROV-O]] ontology:

- auditTrailService.getAuditRecordRevision(), which describes what has changed
- auditTrailService.getAuditRecordActivity(), which describes when and how the change was made
- auditTrailService.getAuditRecordAttributions(), which describes the agents who contributed to the change.
Java:
Revision revision = auditTrailService.getAuditRecordRevision();
VersionedEntity revised = (VersionedEntity) revision.revised();
System.out.println("version of the revised entity: " + revised.version());
Activity activity = auditTrailService.getAuditRecordActivity();
System.out.println("revision started at: " + activity.startedAtTime());
System.out.println("revision ended at: " + activity.endedAtTime());
for (Attribution attribution : auditTrailService.getAuditRecordAttributions()) {
Agent agent = attribution.agent();
System.out.println("revision done by/with: " + agent.id());
}
C#:
Revision revision = auditTrailService.getAuditRecordRevision();
VersionedEntity revised = (VersionedEntity) revision.revised();
System.Console.WriteLine("version of the revised entity: " + revised.version());
Activity activity = auditTrailService.getAuditRecordActivity();
System.Console.WriteLine("revision started at: " + activity.startedAtTime());
System.Console.WriteLine("revision ended at: " + activity.endedAtTime());
foreach (Attribution attribution in auditTrailService.getAuditRecordAttributions()) {
Agent agent = attribution.agent();
System.Console.WriteLine("revision done by/with: " + agent.id());
}
The changes in the three parts of the ADF file can be read as follows:

- auditTrailService.getAuditRecordDataCubeChanges()
- auditTrailService.getAuditRecordDataPackageChanges()
- auditTrailService.getAuditRecordDataDescriptionChanges()

Each of these methods returns a ChangeSet containing additions(), removals(), and updates(). What is added, removed, or updated depends on which part of the ADF file was changed.
Java:
ChangeSet dcChanges = auditTrailService.getAuditRecordDataCubeChanges(auditRecordIri);
for (Resource addedResource : dcChanges.additions()) {
DataSet addedCube = (DataSet) addedResource;
System.out.println("data cube added: " + addedCube.id());
}
for (Resource removedResource : dcChanges.removals()) {
DataSet removedCube = (DataSet) removedResource;
System.out.println("data cube removed: " + removedCube.id());
}
for (DataUpdate update : dcChanges.updates()) {
Resource updatedCube = update.target();
System.out.println("data cube updated: " + updatedCube.id());
// selection of what was removed in the cube
DataSelection oldData = (DataSelection) update.oldDataReference();
// selection of what was added in the cube
DataSelection newData = (DataSelection) update.newDataReference();
}
C#:
ChangeSet dcChanges = auditTrailService.getAuditRecordDataCubeChanges(auditRecordIri);
foreach (Resource addedResource in dcChanges.additions()) {
DataSet addedCube = (DataSet) addedResource;
System.Console.WriteLine("data cube added: " + addedCube.id());
}
foreach (Resource removedResource in dcChanges.removals()) {
DataSet removedCube = (DataSet) removedResource;
System.Console.WriteLine("data cube removed: " + removedCube.id());
}
foreach (DataUpdate update in dcChanges.updates()) {
Resource updatedCube = update.target();
System.Console.WriteLine("data cube updated: " + updatedCube.id());
// selection of what was removed in the cube
DataSelection oldData = (DataSelection) update.oldDataReference();
// selection of what was added in the cube
DataSelection newData = (DataSelection) update.newDataReference();
}
The data selections returned in a data update can be used to retrieve the old data using auditTrailService.getArchivedDatacubeData(). The returned data is an array of the size of the data selection. If a whole data cube has been deleted, the data selection is the full selection of the deleted data cube. Any deleted or overwritten data of a cube is archived in a one-dimensional internal archive dataset of the cube.
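Retrieving the archived values might look like the following sketch. The element type of the returned array depends on the data type of the cube, so the double[] cast here is an assumption for illustration.

Java and C#:

```java
// Sketch: fetch the pre-update values for a data cube change.
// 'update' is a DataUpdate taken from the data cube ChangeSet above.
DataSelection oldData = (DataSelection) update.oldDataReference();
// The returned array has the size of the data selection; the double[]
// element type is assumed and depends on the cube's data type.
double[] archived = (double[]) auditTrailService.getArchivedDatacubeData(oldData);
```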
For data package changesets, the additions and removals are data package files or folders. A data update can only be done on a file, and currently this is always an append, so there is never existing data overwritten that needs to be archived.
Java:
ChangeSet dpChanges = auditTrailService.getAuditRecordDataPackageChanges(auditRecordIri);
for (Resource addedResource : dpChanges.additions()) {
if (addedResource instanceof File) {
File addedFile = (File) addedResource;
System.out.println("file added: " + addedFile.id());
} else if (addedResource instanceof Folder) {
Folder addedFolder = (Folder) addedResource;
System.out.println("folder added: " + addedFolder.id());
}
}
for (Resource removedResource : dpChanges.removals()) {
if (removedResource instanceof File) {
File removedFile = (File) removedResource;
System.out.println("file removed: " + removedFile.id());
} else if (removedResource instanceof Folder) {
Folder removedFolder = (Folder) removedResource;
System.out.println("folder removed: " + removedFolder.id());
}
}
for (DataUpdate update : dpChanges.updates()) {
Resource updatedFile = update.target();
System.out.println("file updated: " + updatedFile.id()); // always a file, never a folder
Segment newData = (Segment) update.newDataReference();
}
C#:
ChangeSet dpChanges = auditTrailService.getAuditRecordDataPackageChanges(auditRecordIri);
foreach (Resource addedResource in dpChanges.additions()) {
if (addedResource is File) {
File addedFile = (File) addedResource;
System.Console.WriteLine("file added: " + addedFile.id());
} else if (addedResource is Folder) {
Folder addedFolder = (Folder) addedResource;
System.Console.WriteLine("folder added: " + addedFolder.id());
}
}
foreach (Resource removedResource in dpChanges.removals()) {
if (removedResource is File) {
File removedFile = (File) removedResource;
System.Console.WriteLine("file removed: " + removedFile.id());
} else if (removedResource is Folder) {
Folder removedFolder = (Folder) removedResource;
System.Console.WriteLine("folder removed: " + removedFolder.id());
}
}
foreach (DataUpdate update in dpChanges.updates()) {
Resource updatedFile = update.target();
System.Console.WriteLine("file updated: " + updatedFile.id()); // always a file, never a folder
Segment newData = (Segment) update.newDataReference();
}
A deleted file can be retrieved from the archive using auditTrailService.openArchivedFile().
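Reading an archived file back might look like the following sketch. The parameter (the removed file's IRI) and the InputStream return type are assumptions here; consult the API reference for the exact signature.

Java:

```java
// Sketch only: open the archived content of a file that was removed
// in an audit record. 'removedFile' is a File from the removals above.
try (InputStream in = auditTrailService.openArchivedFile(removedFile.id().get())) {
    // process the archived file content, e.g. copy it to a recovery location
}
```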
For data description changesets, the additions and removals are named graphs. A data update on the data description is the addition and removal of statements (triples) to/from a named graph, usually the default graph. All added/removed statements are collected in a separate named graph. An update of a single statement is always the combination of the removal of the old statement and the addition of the new statement.
Assuming that the change had been the addition of a statement to the data description default model:
Java and C#:
ChangeCapture createStatements = auditTrailService.startOperation(agent, "reason for creating statements", someSoftware);
// we add a single statement
Model ddDefaultGraph = adfFile.getDataDescription();
ddDefaultGraph.add(ResourceFactory.createResource("http://example.org/res/someThing"),
ResourceFactory.createProperty("http://example.org/prop/someProperty"),
ResourceFactory.createPlainLiteral("a value"));
auditRecord = createStatements.commit();
This change is a DataUpdate, and the added statement is part of a named graph in the dataset of the audit record, returned as newData(). The RDF dataset of the audit record is retrieved with auditTrailService.getAuditRecordDataset(). The graph that the statement was added to - in this case, the default graph - is the target() of the data update:
Java:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
for (DataUpdate update : ddChanges.updates()) {
Resource targetModel = update.target();
Resource newStatements = update.newData();
Model addedStatementModel = auditRecordDataset.getNamedModel(newStatements.id().get());
Statement stmt = addedStatementModel.listStatements().next();
}
C#:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
foreach (DataUpdate update in ddChanges.updates()) {
Resource targetModel = update.target();
Resource newStatements = update.newData();
Model addedStatementModel = auditRecordDataset.getNamedModel(newStatements.id().get());
Statement stmt = addedStatementModel.listStatements().next();
}
Now assuming that the changes had been the addition of a named graph to the data description dataset:
Java and C#:
ChangeCapture createNamedGraph = auditTrailService.startOperation(agent, "reason for creating a named graph", someSoftware);
// we create a simple named graph with a single statement
Model namedGraph = ModelFactory.createMemModelMaker().createFreshModel();
namedGraph.add(ResourceFactory.createResource("http://example.org/res/someThing"),
ResourceFactory.createProperty("http://example.org/prop/someProperty"),
ResourceFactory.createPlainLiteral("a value"));
// and add it to the data description dataset
Dataset ddSet = adfFile.getDataset();
ddSet.addNamedModel("http://example.org/example/graph", namedGraph);
auditRecord = createNamedGraph.commit();
In this case, the changeset contains a single addition:
Java:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
for (Resource addition : ddChanges.additions()) {
Model addedModel = auditRecordDataset.getNamedModel(addition.id().get());
Statement stmt = addedModel.listStatements().next();
}
C#:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
foreach (Resource addition in ddChanges.additions()) {
Model addedModel = auditRecordDataset.getNamedModel(addition.id().get());
Statement stmt = addedModel.listStatements().next();
}
If a named graph had been removed in an audit record, the graph is stored under a different name. To retrieve the graph, the method getArchivedNamedModel can be used.
The following example shows how to retrieve the model if a graph had been removed:
Java:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
for (Resource removal : ddChanges.removals()) {
String removedModelName = removal.id().get();
Model removedModel = auditTrailService.getArchivedNamedModel(auditRecordIri, removedModelName);
}
C#:
Dataset auditRecordDataset = auditTrailService.getAuditRecordDataset(auditRecordIri);
ChangeSet ddChanges = auditTrailService.getAuditRecordDataDescriptionChanges(auditRecordIri);
foreach (Resource removal in ddChanges.removals()) {
string removedModelName = removal.id().get();
Model removedModel = auditTrailService.getArchivedNamedModel(auditRecordIri, removedModelName);
}
The Audit Trail API makes use of Java/C# classes that represent defined patterns of an RDF graph and thus work like a W3C shape. Several libraries with these classes for different vocabularies are made available in the data shapes projects:
Java and C#:
Person jerry = FoafModel.person("mailto:jerry.mouse@example.org")
.firstName("Jerry")
.lastName("Mouse")
.buildAll();
Person tom = FoafModel.person("mailto:tom.cat@example.org")
.firstName("Tom")
.lastName("Cat")
.knows(jerry)
.buildAll();
The two shape objects are related via the knows() property. firstName() and lastName() represent basic literal properties. In an RDF representation, this would be:
<mailto:jerry.mouse@example.org>
a foaf:Agent , foaf:Person ;
foaf:firstName "Jerry" ;
foaf:lastName "Mouse" .
<mailto:tom.cat@example.org>
a foaf:Agent , foaf:Person ;
foaf:firstName "Tom" ;
foaf:knows <mailto:jerry.mouse@example.org> ;
foaf:lastName "Cat" .
The shape classes often appear as results and parameters in the Audit Trail API, but their main function is that they can be read from and written to an RDF model. The audit record is quite complex, and writing and reading it solely with the RDF APIs would be very error-prone; the shape classes help here. The mapping between the shape classes in Java/C# and the RDF classes and properties is done with annotations on the Java/C# class, which map it to one or more RDF classes, and with annotations on the property methods, which are mapped to RDF properties.
Writing to RDF includes only non-empty, annotated properties and all the RDF types associated with the class. When reading from an RDF model, all rdf:type predicates with the subject are read first and then, depending on the target class, all annotated properties on the class are read as well. The class that performs the read/write to the RDF model is Object2RDF. Writing to an RDF graph is straightforward, but reading is more complicated: a method might declare a generic super class as a parameter, and only at runtime can the correct subclass be known. An application usually knows what concrete class to expect, but the generic adapter class Object2RDF does not, so it only reads along those predicates of the RDF model that are actually declared in the class. To handle this problem, a ReadCallback on the generic super class can be registered. The callback will be called before and after the build with the Builder. If the application knows (or can determine by querying the RDF model) what subclass it should instantiate instead, it can perform a read using the subclass as target class on the current node to process. Shape classes implement multiple inheritance (which occurs in RDF models) by implementing mixins. Thus, the instance of the super class and the just-read instance of the subclass can be mixed into a new mixin object that merges both, so that a downcast to the subclass on the returned object will not fail. The read method of Object2RDF also lets one define the scope of how deep the RDF graph is transitively traversed.
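Round-tripping a shape object through an RDF model might look like the following sketch. The method names write() and read() on Object2RDF are assumptions based on the description above; consult the API reference for the exact signatures.

Java:

```java
// Sketch only: write a FOAF Person shape to a Jena model and read it back.
// The Object2RDF method names and signatures are assumed for illustration.
Person jerry = FoafModel.person("mailto:jerry.mouse@example.org")
        .firstName("Jerry")
        .lastName("Mouse")
        .buildAll();

Model model = ModelFactory.createDefaultModel();
Object2RDF object2Rdf = new Object2RDF();
object2Rdf.write(jerry, model); // emits the rdf:type and property triples

// Read back, using the concrete class as the target class.
Person readBack = object2Rdf.read(model,
        model.createResource("mailto:jerry.mouse@example.org"), Person.class);
```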
The ADF First Steps "General" Example Application illustrates the Java API with one complete code example. It is contained in the file FirstSteps.java in the package org.allotrope.adf.firststeps. Analogously, there is a C# example in the file Examples\net-allotrope-adf-firststeps.sln.
Version | Release Date | Remarks |
---|---|---|
1.0.0 | 2017-06-30 | |
1.4.0 | 2017-10-31 | |
1.4.3 RC | 2018-10-11 | |
1.4.5 RF | 2018-12-17 | |
1.5.0 RC | 2019-12-12 | |
1.5.0 RF | 2020-03-03 | |
1.5.3 RF | 2020-11-30 | |